Species distribution models (SDMs) are used to predict the habitat range of different organisms. They offer some of the most informative ways to show how shifting environmental conditions affect where species occur in an area.
Yet species rarely stay in the same spot for long periods of time. Just like us, they react to changes in their environment, interactions with other species, and interactions with other individuals. As a result, it can be very useful to see how a distribution of a species changes in space and over time. In marine environments, for example, seemingly small changes in temperature, chemicals and light can result in large changes to a species’ distribution.
Here we will use a series of SDMs to predict the distribution of Nudibranchia around Australia each month to create an animated map that shows how nudibranch distributions change over the year.
This post is inspired by Liam Bailey’s cool (and hilarious) Bigfoot distribution map. You can find his code here.
A brief intro to species distribution models
At their simplest, SDMs are statistical models that use information of where a species has occurred to predict where it occurs over a broader area, even in places it hasn’t been seen yet!
SDMs are particularly informative because they can use information about the environment, added as predictor variables, to help calculate where a species is most likely might live. Common environmental predictor variables include temperature, rainfall, chemical concentrations and even locations of other organisms.
This means that SDMs are usually given 2 main inputs:
- Occurrence data
- Environmental variables
Using these inputs, our species distribution model can infer the probability of where a species occurs by combining information about where and under what environmental conditions a species is observed (or not observed) in our data.
Download data
Occurrence data
Let’s first download observations of Nudibranchia across Australia.
Left: Doriprismatica atromarginata (diana88jingfung CC-BY-NC 4.0 (Int)) Middle: Ceratosoma amoenum (Erik Schlogl CC-BY-NC 4.0 (Int)) Right: Pteraeolidia ianthina (Jallitt CC-BY-NC 4.0 (Int))
We’ll load the necessary packages.
library(galah)
library(tidyverse)
library(glue)
library(lubridate)
library(stars) # Raster management
library(ozmaps) # Australian map
library(SSDM) # Linear modelling
library(sdmpredictors) # Environmental variables
library(grDevices) # Colours and fonts
library(maps) # Cities for map
library(tmaptools) # Create plot ratio
library(gifski) # Create GIF
library(knitr) # View GIFNow we will use the {galah} package to download observations of Nudibranchia.
You will need to provide a registered email with the ALA to galah_config() before retrieving records.
# Add registered email (register at ala.org.au)
galah_config(email = "your-email@email.com")# Download observations
nudibranch_occurrences <-
galah_call() |>
galah_identify("Nudibranchia") |>
galah_filter(country == "Australia") |>
galah_apply_profile(ALA) |> # ALA's set of data cleaning filters
atlas_occurrences() Environmental Variables
Now we will download our environmental variables for our model.
For our Nudibranchia model, we will use 4 common marine environmental variables:
- Sea surface temperature
- Sea surface salinity
- Distance to shore
- Bathemetry
To get them, we’ll use load_layers() from the {sdmpredictors} package to download our variables as raster layers (geographic layers that have a value per pixel of our variable). We’ll use the rasterstack argument to combine our layers into one object.
The {sdmpredictors} package has lots of data sets and layers available. Check out their website to learn more.
# Download variables
env <- load_layers(layercodes = c("MS_biogeo08_sss_mean_5m",
"MS_biogeo13_sst_mean_5m",
"MS_biogeo05_dist_shore_5m",
"MS_bathy_5m"),
equalarea = FALSE,
rasterstack = TRUE)To prepare variable data for our model, we need to crop the geographical boundaries of our data to include only the coast (and surrounding ocean) of Australia. With the help of the {raster} package, we’ll use extent() to set the outer boundaries and crop() to remove the land.
# Create extent
aus_ext <- raster::extent(100, 165, -45, -10)
# Limit environmental variables
aus_env <- raster::crop(env, aus_ext)
# Check variables
plot(aus_env)Prepare data
To construct our animated GIF, we can make each “frame” of our animation a species distribution map of each month - that means 12 maps, January to December.
In order to do this, we’ll create a set of custom functions that can:
- Filter our nudibranch observations into a smaller
data.framecontaining observations for a specific month, - Run an SDMs on data in our smaller
data.frame - Plot the SDM results onto a map
- Save the map with a nice name
By making custom functions for these tasks, we’ll be able to run each function in a loop so we can do each thing 12 times for each of our 12 months.
At the end, we’ll stitch our 12 maps together and, Voila! We’ll officially be animators (Pixar here we come!).
First we’ll need to make a few data wrangling steps to clean and organise our data:
- filter out
NAvalues and duplicates (which might cause our model to error) - extract month of observation into its own column with the
month()function - select month, latitude, longitude columns of each observation
# Clean, filter and convert time series to months
occurrences_clean <-
nudibranch_occurrences |>
filter(!is.na(decimalLatitude) & !is.na(decimalLongitude)) |>
filter(!duplicated(decimalLatitude) & !duplicated(decimalLongitude)) |>
mutate(month = month(eventDate)) |>
select(month,
latitude = decimalLatitude,
longitude = decimalLongitude)
head(occurrences_clean)# A tibble: 6 × 3
month latitude longitude
<dbl> <dbl> <dbl>
1 12 -55.1 159.
2 3 -54.5 159.
3 NA -52.4 71.9
4 1 -45.1 146.
5 1 -44.3 147.
6 1 -44.3 147.
From here, let’s create 12 separate data.frames containing observations for each month.
To help, we’ll make our own function make_months_df() that filters our observations in occurrences_clean to our chosen month and selects the latitude and longitude columns:
# Build function (for each month select the lat and long)
make_months_df <- function(chosen_month) {
monthly_data <- occurrences_clean %>%
filter(month == {{chosen_month}}) %>%
select(latitude, longitude)
}Because each month is assigned as a number, with the help of purrr::map() we can run a loop over our make_months_df() function by specifying which month number we want data for.
Let’s create a vector containing numbers 1-12 (n_months) which we’ll loop over our function to return our 12 data.frames as a list (month_list).
n_months <- c(1:12)
month_list <- purrr::map(n_months, make_months_df)
month_list[[1]] # See output of month 1# A tibble: 1,558 × 2
latitude longitude
<dbl> <dbl>
1 -45.1 146.
2 -44.3 147.
3 -44.3 147.
4 -43.1 148.
5 -43.1 147.
6 -43.0 148.
7 -42.6 148.
8 -41.9 148.
9 -41.2 146.
10 -41.1 146.
# … with 1,548 more rows
Species Distribution Model
Now that month_list contains our 12 data.frames, we’re ready to make our SDM.
To build our overall SDM we will run 3 different statistical models. Each model type makes different assumptions about the data and its variation, and without getting bogged down in statistics, this just means that each model will calculate slightly different probabilities of where nudibranchs occur because of these different assumptions.
We’ll then merge the 3 probabilities into one final value using Fisher’s combined probability. Merging multiple model results will help to balance our final value between our 3 model’s results so that it is less skewed by our model choice, hopefully giving us a more conservative final estimate.
We’ll build another custom function to run these models for each data.frame in month_list. We’ll call this function run_sdm_model, and all it does is run the exact same SDM as we did earlier, but uses the chosen month for occurrences rather than the entire data set.
We’ve chosen to use the model used by Liam Bailey in his Bigfooot map. It’s a fairly flexible model that suits our purposes to quickly see where nudibranchs are observed around Australia. But it is by no means the most robust SDM (and would not hold up through a more rigourous peer-review process). If you are trying to make a more informative species distribution model, you should consider another method!
# Create a function that calculates the chi squared value
Chi_sq <- function(x){
1 - pchisq(q = x, df = 6)
}
# Species distribution model function
run_sdm_model <- function(chosen_month) {
SDM_GLM <- modelling("GLM",
Occurrences = (chosen_month),
Env = aus_env,
Xcol = 'longitude',
Ycol = 'latitude',
verbose = FALSE)
SDM_MARS <- modelling("MARS",
Occurrences = (chosen_month),
Env = aus_env,
Xcol = 'longitude',
Ycol = 'latitude',
verbose = FALSE)
SDM_CTA <- modelling("CTA",
Occurrences = (chosen_month),
Env = aus_env,
Xcol = 'longitude',
Ycol = 'latitude',
verbose = FALSE)
# Combine these to one value using Fisher's combined probability
combined <- -2 * (log(SDM_MARS@projection) + log(SDM_GLM@projection) + log(SDM_CTA@projection))
# Apply the function to our data
combined_pval <- raster::calc(combined, fun = Chi_sq)
# Convert to stars as it is easier to use with ggplot
species_distribution <- stars::st_as_stars(combined_pval)
return(species_distribution)
}Now we can use purrr::map() to run another loop to return results of our 12 SDMs.
# Run & save models
model_list <- purrr::map(month_list, run_sdm_model) Map
We now have the results of our 12 SDMs in model_list. We can use these results to make 12 maps.
To help orient ourselves, let’s download point data of the main cities in Australia to add to our maps.
# Download all cities in Australia
world_cities <- world.cities[world.cities$country.etc == "Australia",]
# Filter to the specific cities for the east and west coast
cities <- world_cities %>%
filter(name == "Sydney" | name == "Melbourne" | name == "Perth" |
name == "Brisbane" | name == "Cairns" | name == "Canberra" |
name == "Adelaide" | name == "Melbourne")Let’s also choose a colour palette for our map. To make our map easier to interpret, we’ve elected to make lower values a dark blue shade (so they look like the ocean) and higher values a brighter yellow:
blue_yellow <- c( "#184E77", "#1E6091", "#168AAD", "#34A0A4", "#52B69A",
"#76C893", "#99D98C", "#B5E48C", "#D9ED92")
colour_palette <- colorRampPalette(blue_yellow)(50)
feathers::print_pal(colour_palette)Now we are ready to make maps of our results!
We’ll once again make a custom function make_the_map() to construct each map. This function contains a few things to suit our needs:
- Plots our SDM results with
geom_stars() - Adds each month’s 3-letter abbreviation
month_labelto the top of each map - Limits the map to the east coast (there appears to more nudibranchs there, and zooming in will help us see the detail)
- Adds our nice colour palette with
scale_fill_gradientn()& a formats a custom legend withguide_colourbar() - Adds cities to our map
Let’s make some month labels:
# Get label for each month
month_label <- month(1:12, label = TRUE)
month_label [1] Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
12 Levels: Jan < Feb < Mar < Apr < May < Jun < Jul < Aug < Sep < ... < Dec
And now we’ll create our make_the_map() function:
# Map making function (for monthly SDM build this map)
make_the_map <- function(model_data, month_label) {
month <- {{month_label}}
ggplot() +
geom_stars(data = model_data) +
geom_sf(
data = ozmaps::ozmap_states,
colour = "#A9A793",
fill = "#C8C6AF") +
scale_fill_gradientn(
colours = c(colour_palette),
limits = c(0, 1),
guide = guide_colourbar(
title = "Occurrence\nprobability",
title.theme = element_text(
family = "Times New Roman",
colour = "#3D4040",
size = 10,
face = "bold"
),
label.theme = element_text(
colour = "#3D4040",
size = 8
),
ticks = FALSE,
frame.colour = "#3D4040",
title.position = "top",
title.vjust = 2,
label.position = "left"
),
breaks = c(0, 0.5, 1),
labels = c("0%", "50%", "100%")
) +
# Title map with month
labs(
title = glue("{month_label}")) +
theme_void() +
theme(
legend.position = c(1.2, 0.2), # reposition legend
legend.direction = "vertical",
plot.background = element_rect(fill = "#FFFFFF", colour = NA),
plot.margin = unit(c(0.01, 2.5, 0.1, 0.1), "cm")) +
# Zoom into East Coast
coord_sf(
crs = "WGS84",
xlim = c(135, 159),
ylim = c(-42.9, -11.5)
) +
# Add city points
geom_point(
data = cities,
aes(x = long, y = lat),
colour = "#3D4040") +
# Add city labels
geom_text(
data = cities,
aes(x = long, y = lat, label = name),
colour = "#3D4040",
nudge_x = -1.8,
nudge_y = 0.6,
family = "Times New Roman")
}We’ll also create custom save function that makes sure our maps are all saved as the same dimensions, gives them a file name we can use to order our maps correctly later, and saves them as a png in a new folder called maps.
To figure out the best aspect ratio, we’ll use the get_asp_ratio() function from the {tmaptools} package, and use that ratio to calculate the width of our map.
# Build a ratio based off of our distribution results
plot_ratio <- get_asp_ratio(model_list[[1]])
# Define letter to save map as (lets us import them in correct order for gif)
letter_id <- as.list(letters[1:12])
# Map-saving function
save_the_map <- function(chosen_map, id_label) {
map_label <- {{id_label}}
ggsave(chosen_map,
width = plot_ratio*4, height = 5,
file = glue("maps/map_{map_label}.png"),
device = "png")
}Now we can use map2() to run our custom functions in a loop:
# Make SDM maps and save each map
model_list %>%
purrr::map2(
.x = .,
.y = month_label,
.f = make_the_map
) %>%
purrr::map2(
.x = .,
.y = letter_id,
.f = save_the_map
)We should now have 12 species distribution maps saved, and we can see them by returning all the files in our maps folder!
list.files(path = "maps/") [1] "map_a.png" "map_b.png" "map_c.png" "map_d.png" "map_e.png" "map_f.png"
[7] "map_g.png" "map_h.png" "map_i.png" "map_j.png" "map_k.png" "map_l.png"
Let’s save our file names in an object so we can use them to make our GIF. We just have to use glue() to put the folder name on the front.
# Save file names
png_files <- list.files(path = "maps/")
map_paths <- glue("maps/{png_files}")Make GIF
For context before we see our animation, let’s first look at the distribution of all nudibranchs across Australia (this uses all the same code as above, but without all the looping)
Code
## Run the 3 models on our data
SDM_GLM <- modelling("GLM",
Occurrences = occurrences_clean,
Env = aus_env,
Xcol = 'longitude',
Ycol = 'latitude',
verbose = FALSE)
SDM_MARS <- modelling("MARS",
Occurrences = occurrences_clean,
Env = aus_env,
Xcol = 'longitude',
Ycol = 'latitude',
verbose = FALSE)
SDM_CTA <- modelling("CTA",
Occurrences = occurrences_clean,
Env = aus_env,
Xcol = 'longitude',
Ycol = 'latitude',
verbose = FALSE)
# Combine these to one value using Fisher's combined probability
combined <- -2 * (log(SDM_MARS@projection) + log(SDM_GLM@projection) + log(SDM_CTA@projection))
# Apply the function to our data
combined_pval <- raster::calc(combined, fun = Chi_sq)
# Convert to stars as it is easier to use with ggplot
species_distribution <- stars::st_as_stars(combined_pval)
## MAP
ggplot() +
geom_stars(data = species_distribution) + # Plot SDM results
geom_sf(data = ozmaps::ozmap_country, # Add Australian map
colour = "grey",
fill = "#C8C6AF") +
coord_sf(crs = "WGS84", # Set geographical boundaries
xlim = c(112, 154),
ylim = c(-43, -11)) +
scale_fill_gradientn(
colours = c(colour_palette), # Use custom palette
limits = c(0, 1),
guide = guide_colourbar(
title = "Occurrence probability", # title of legend
title.theme = element_text( # style legend title
family = "Times New Roman",
colour = "#B3B6B6",
face = "bold",
size = 12
),
label.theme = element_text( # style legend text
colour = "#B3B6B6",
size = 10
),
ticks = FALSE,
frame.colour = "#B3B6B6",
title.position = "top"
),
breaks = c(0, 0.5, 1),
labels = c("0%", "50%", "100%")
) +
theme_void() +
theme(
legend.position = c(0.2, 0.1),
legend.direction = "horizontal",
legend.key.size = unit(5, "mm"),
plot.background = element_rect(fill = "#F7F7F3", color = "#F16704"),
panel.border = element_rect(
color = "#FFFFFF",
fill = NA,
size = 2)
)Our map shows that there are nudibranchs along pretty much the entire coastline of Australia! Their distribution appears to spread slightly further from the coast in the northern half of Australia.
Now let’s finish our animation by sticking our 12 monthly maps together with the {gifski} package!
# Create animation
gifski(map_paths, gif_file = "SDM.gif", delay = 0.5)knitr::include_graphics("SDM.gif")We now have our animated GIF!
Our animation shows that nudibranchs can be observed all year long, though there are some months where you are more likely to observe nudibranchs in more places than others.
However, our animation also seems to have a few limitations:
- It’s hard to tell what time of the year nudibranchs are more likely to occur
Distributions predicted using more observations are less influenced by outliers. This is because our model has more information in these months to help make its predictions, which makes their results more robust. However, whether one month had more observations than another isn’t something you can easily tell from our animation (a distribution doesn’t necessarily get bigger when there are more observations, and smaller when there are fewer). The truth is, there are more observations of nudibranchs from October-January, and fewer from May-August. Without already knowing, we probably couldn’t tell this from our map, and so our map can be misleading for someone interpreting its results.
- Some distributions are blurry
Blurriness happens when our model is less able to make specific predictions of where nudibranchs are (i.e. greater model uncertainty). Uncertainty can grow either because there is less data or because data are more varied in a given month. Again, just from looking at our animation, it’s difficult to tell whether some months nudibranchs are distributed more widely, or whether there just wasn’t much data for that month.
In the end, our animation is a nice demonstration of how model uncertainty can make it hard to interpret results. When data are broken down into smaller and smaller groups (which often happens over the course of an entire analysis), we increase the chance of model uncertainty because there are fewer data points for our models to use. It’s good to be wary of this phenomenon so you can be careful and avoid over-interpretting your results (and can be wary if you see other people doing it elsewhere)!
Nonetheless, we hope you’ve felt the thrill of constructing your own stop-motion animation with {ggplot2} and {gifski}!
Expand for session info
─ Session info ───────────────────────────────────────────────────────────────
setting value
version R version 4.2.2 (2022-10-31 ucrt)
os Windows 10 x64 (build 19044)
system x86_64, mingw32
ui RTerm
language (EN)
collate English_Australia.utf8
ctype English_Australia.utf8
tz Australia/Sydney
date 2023-03-07
pandoc 2.19.2 @ C:/Program Files/RStudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)
─ Packages ───────────────────────────────────────────────────────────────────
package * version date (UTC) lib source
abind * 1.4-5 2016-07-21 [1] CRAN (R 4.2.0)
dplyr * 1.1.0 2023-01-29 [1] CRAN (R 4.2.2)
forcats * 0.5.2 2022-08-19 [1] CRAN (R 4.2.1)
galah * 1.5.1 2023-02-21 [1] Github (AtlasOfLivingAustralia/galah@bd43dd2)
ggplot2 * 3.3.6 2022-05-03 [1] CRAN (R 4.2.1)
gifski * 1.6.6-1 2022-04-05 [1] CRAN (R 4.2.1)
glue * 1.6.2 2022-02-24 [1] CRAN (R 4.2.1)
htmltools * 0.5.4 2022-12-07 [1] CRAN (R 4.2.2)
knitr * 1.40 2022-08-24 [1] CRAN (R 4.2.1)
lubridate * 1.8.0 2021-10-07 [1] CRAN (R 4.2.1)
maps * 3.4.0 2021-09-25 [1] CRAN (R 4.2.1)
ozmaps * 0.4.5 2021-08-03 [1] CRAN (R 4.2.1)
purrr * 0.3.4 2020-04-17 [1] CRAN (R 4.2.1)
readr * 2.1.3 2022-10-01 [1] CRAN (R 4.2.2)
sdmpredictors * 0.2.13 2022-09-13 [1] CRAN (R 4.2.1)
sessioninfo * 1.2.2 2021-12-06 [1] CRAN (R 4.2.1)
sf * 1.0-8 2022-07-14 [1] CRAN (R 4.2.1)
SSDM * 0.2.8 2020-02-28 [1] CRAN (R 4.2.1)
stars * 0.5-6 2022-07-21 [1] CRAN (R 4.2.1)
stringr * 1.5.0 2022-12-02 [1] CRAN (R 4.2.2)
tibble * 3.1.8 2022-07-22 [1] CRAN (R 4.2.1)
tidyr * 1.2.0 2022-02-01 [1] CRAN (R 4.2.1)
tidyverse * 1.3.2 2022-07-18 [1] CRAN (R 4.2.1)
tmaptools * 3.1-1 2021-01-19 [1] CRAN (R 4.2.2)
[1] C:/Users/KEL329/R-packages
[2] C:/Users/KEL329/AppData/Local/Programs/R/R-4.2.2/library
──────────────────────────────────────────────────────────────────────────────